Results 1 - 14 of 14
1.
bioRxiv ; 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38562882

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has transformed our understanding of cell fate in developmental systems. However, identifying the molecular hallmarks of potency (the capacity of a cell to differentiate into other cell types) has remained challenging. Here, we introduce CytoTRACE 2, an interpretable deep learning framework for characterizing potency and differentiation states on an absolute scale from scRNA-seq data. Across 31 human and mouse scRNA-seq datasets encompassing 28 tissue types, CytoTRACE 2 outperformed existing methods for recovering experimentally determined potency levels and differentiation states covering the entire range of cellular ontogeny. Moreover, it reconstructed the temporal hierarchy of mouse embryogenesis across 62 timepoints; identified pan-tissue expression programs that discriminate major potency levels; and facilitated discovery of cellular phenotypes in cancer linked to survival and immunotherapy resistance. Our results illuminate a fundamental feature of cell biology and provide a broadly applicable platform for delineating single-cell differentiation landscapes in health and disease.

2.
Pac Symp Biocomput ; 29: 492-505, 2024.
Article in English | MEDLINE | ID: mdl-38160302

ABSTRACT

Subcellular protein localization is important for understanding functional states of cells, but measuring and quantifying this information can be difficult and typically requires high-resolution microscopy. In this work, we develop a metric to define surface protein polarity from immunofluorescence (IF) imaging data and use it to identify distinct immune cell states within tumor microenvironments. We apply this metric to characterize over two million cells across 600 patient samples and find that cells identified as having polar expression exhibit characteristics relating to tumor-immune cell engagement. Additionally, we show that incorporating these polarity-defined cell subtypes improves the performance of deep learning models trained to predict patient survival outcomes. This method provides a first look at using subcellular protein expression patterns to phenotype immune cell functional states with applications to precision medicine.


Subjects
Computational Biology; Proteomics; Humans; Proteomics/methods
3.
PNAS Nexus ; 2(6): pgad171, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37275261

ABSTRACT

Multiplex immunofluorescence (mIF) assays multiple protein biomarkers on a single tissue section. Recently, high-plex CODEX (co-detection by indexing) systems enable simultaneous imaging of 40+ protein biomarkers, unlocking more detailed molecular phenotyping, leading to richer insights into cellular interactions and disease. However, high-plex data can be slower and more costly to collect, limiting its applications, especially in clinical settings. We propose a machine learning framework, 7-UP, that can computationally generate in silico 40-plex CODEX at single-cell resolution from a standard 7-plex mIF panel by leveraging cellular morphology. We demonstrate the usefulness of the imputed biomarkers in accurately classifying cell types and predicting patient survival outcomes. Furthermore, 7-UP's imputations generalize well across samples from different clinical sites and cancer types. 7-UP opens the possibility of in silico CODEX, making insights from high-plex mIF more widely available.

4.
Physiol Rev ; 103(4): 2423-2450, 2023 10 01.
Article in English | MEDLINE | ID: mdl-37104717

ABSTRACT

Artificial intelligence in health care has experienced remarkable innovation and progress in the last decade. Significant advancements can be attributed to the utilization of artificial intelligence to transform physiology data to advance health care. In this review, we explore how past work has shaped the field and defined future challenges and directions. In particular, we focus on three areas of development. First, we give an overview of artificial intelligence, with special attention to the most relevant artificial intelligence models. We then detail how physiology data have been harnessed by artificial intelligence to advance the main areas of health care: automating existing health care tasks, increasing access to care, and augmenting health care capabilities. Finally, we discuss emerging concerns surrounding the use of individual physiology data and detail an increasingly important consideration for the field, namely the challenges of deploying artificial intelligence models to achieve meaningful clinical impact.


Subjects
Artificial Intelligence; Delivery of Health Care; Humans
5.
Nat Rev Cancer ; 23(8): 508, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37117275
6.
Nat Biomed Eng ; 6(12): 1435-1448, 2022 12.
Article in English | MEDLINE | ID: mdl-36357512

ABSTRACT

Multiplexed immunofluorescence imaging allows the multidimensional molecular profiling of cellular environments at subcellular resolution. However, identifying and characterizing disease-relevant microenvironments from these rich datasets is challenging. Here we show that a graph neural network that leverages spatial protein profiles in tissue specimens to model tumour microenvironments as local subgraphs captures distinctive cellular interactions associated with differential clinical outcomes. We applied this spatial cellular-graph strategy to specimens of human head-and-neck and colorectal cancers assayed with 40-plex immunofluorescence imaging to identify spatial motifs associated with cancer recurrence and with patient survival after treatment. The graph deep learning model was substantially more accurate in predicting patient outcomes than deep learning approaches that model spatial data on the basis of the local composition of cell types, and it generated insights into the effect of the spatial compartmentalization of tumour cells and granulocytes on patient prognosis. Local graphs may also aid in the analysis of disease-relevant motifs in histology samples characterized via spatial transcriptomics and other -omics techniques.


Subjects
Deep Learning; Humans; Tumor Microenvironment; Neural Networks, Computer; Gene Expression Profiling/methods
7.
Mol Biol Cell ; 33(6): ar59, 2022 05 15.
Article in English | MEDLINE | ID: mdl-35138913

ABSTRACT

A cell's shape and motion represent fundamental aspects of cell identity and can be highly predictive of function and pathology. However, automated analysis of the morphodynamic states remains challenging for most cell types, especially primary human cells where genetic labeling may not be feasible. To enable automated and quantitative analysis of morphodynamic states, we developed DynaMorph, a computational framework that combines quantitative live cell imaging with self-supervised learning. To demonstrate the robustness and utility of this approach, we used DynaMorph to annotate morphodynamic states observed with label-free measurements of optical density and anisotropy of live microglia isolated from human brain tissue. These cells show complex behavior and have varied responses to disease-relevant perturbations. DynaMorph generates quantitative morphodynamic representations that can be used to compare the effects of the perturbations. Using DynaMorph, we identify distinct morphodynamic states of microglia polarization and detect rare transition events between states. The concepts and the methods presented here can facilitate automated discovery of functional states of diverse cellular systems.


Subjects
Brain; Supervised Machine Learning; Anisotropy; Humans
8.
J Proteomics ; 223: 103820, 2020 07 15.
Article in English | MEDLINE | ID: mdl-32416316

ABSTRACT

Mass spectrometry (MS) based proteomics has become an indispensable component of modern molecular and cellular biochemistry analysis. Multiple reaction monitoring (MRM) is one of the most well-established MS techniques for molecule detection and quantification. Despite its wide usage, an accurate computational framework for analyzing MRM data is lacking, and expert annotation is often required, especially to perform peak integration. Here we propose a deep learning method, PB-Net (Peak Boundary Neural Network), built upon recent advances in sequential neural networks, for fully automatic chromatographic peak integration. To train PB-Net, we generated a large dataset of over 170,000 expert-annotated peaks from MS transitions spanning a wide dynamic range, including both peptides and intact glycopeptides. Our model demonstrated outstanding performance on unseen test samples, reaching near-perfect agreement (Pearson's r = 0.997) with human-annotated ground truth. Systematic evaluations also show that PB-Net is substantially more robust and accurate than previous state-of-the-art peak integration software. PB-Net can benefit the wide community of mass spectrometry data analysis, especially in applications involving high-throughput MS experiments. Code and test data used in this work are available at https://github.com/miaecle/PB-net.
SIGNIFICANCE: Human annotations serve an important role in accurate quantification of multiple reaction monitoring (MRM) experiments, though they are costly to collect and limit analysis throughput. In this work we proposed and developed a novel technique for the peak-integration step in MRM, based on recent innovations in sequential deep learning models. We collected in total 170,000 expert-annotated MRM peaks and trained a set of accurate and robust neural networks for the task. Results demonstrated a substantial improvement over the current state-of-the-art software for mass spectrometry analysis and a level of accuracy and precision comparable to that of human annotators.
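The abstract above reports agreement between automatically integrated peaks and expert annotations as a Pearson correlation. As a rough illustration only, the sketch below integrates a chromatographic trace between two boundary indices with the trapezoidal rule and computes Pearson's r between two lists of areas. The function names `integrate_peak` and `pearson_r` and the toy trace are assumptions made for this example; they are not taken from PB-Net or its repository, and boundary prediction itself (the part PB-Net learns) is not shown.

```python
from math import sqrt


def integrate_peak(times, intensities, start, end):
    """Trapezoidal area under the trace between boundary indices start..end (inclusive)."""
    area = 0.0
    for i in range(start, end):
        dt = times[i + 1] - times[i]
        area += 0.5 * (intensities[i] + intensities[i + 1]) * dt
    return area


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy trace (hypothetical values): a single triangular peak.
toy_times = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_trace = [0.0, 2.0, 4.0, 2.0, 0.0]

# Areas obtained with wide boundaries versus narrower boundaries.
wide_area = integrate_peak(toy_times, toy_trace, 0, 4)
narrow_area = integrate_peak(toy_times, toy_trace, 1, 3)
```

In practice one would collect the areas from predicted boundaries and from annotated boundaries over many peaks and pass the two lists to `pearson_r`.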


Subjects
Deep Learning; Humans; Mass Spectrometry; Peptides; Proteomics; Software
9.
Bioinformatics ; 36(16): 4440-4448, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32330225

ABSTRACT

SUMMARY: Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in genome-wide association study (GWAS) analyses. Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative pairs based on annotations characterizing the variant, the gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 for a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows a significant enrichment in IRT identifying the true target genes versus negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influences on predictions. IRT can be applied to any VUS of interest and each candidate nearby gene to output a score reflecting the likelihood of a regulatory effect on the expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies.
AVAILABILITY AND IMPLEMENTATION: Code and data used in this work are available at https://github.com/miaecle/eQTL_Trees.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
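The gene-ranking figures quoted above (top-1 and top-3 accuracy) can be made concrete with a small sketch. It assumes each variant comes with a score for every nearby candidate gene and exactly one known target; a variant counts as a hit at k if its true target is among the k highest-scoring candidates. The names `rank_genes` and `top_k_accuracy`, the variant and gene labels, and the scores are all hypothetical and are not drawn from the IRT code or data.

```python
def rank_genes(scores):
    """Return candidate genes sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def top_k_accuracy(scores_by_variant, true_target, k):
    """Fraction of variants whose true target gene is among the top-k candidates."""
    hits = sum(
        true_target[variant] in rank_genes(scores)[:k]
        for variant, scores in scores_by_variant.items()
    )
    return hits / len(scores_by_variant)


# Hypothetical scores for two variants, three candidate genes each.
toy_scores = {
    "var1": {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1},
    "var2": {"GENE_D": 0.2, "GENE_E": 0.7, "GENE_F": 0.5},
}
toy_truth = {"var1": "GENE_A", "var2": "GENE_F"}

top1 = top_k_accuracy(toy_scores, toy_truth, k=1)
top2 = top_k_accuracy(toy_scores, toy_truth, k=2)
```

Here `var1` is a hit at k = 1, while `var2` only becomes a hit at k = 2, so the accuracy rises with k.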


Subjects
Genome-Wide Association Study; Quantitative Trait Loci; Chromosome Mapping; Histone Code; Humans; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait Loci/genetics
10.
Nat Biotechnol ; 37(9): 1034-1037, 2019 09.
Article in English | MEDLINE | ID: mdl-31359007

ABSTRACT

Understanding of repair outcomes after Cas9-induced DNA cleavage is still limited, especially in primary human cells. We sequence repair outcomes at 1,656 on-target genomic sites in primary human T cells and use these data to train a machine learning model, which we have called CRISPR Repair Outcome (SPROUT). SPROUT accurately predicts the length, probability and sequence of nucleotide insertions and deletions, and will facilitate design of SpCas9 guide RNAs in therapeutically important primary human cells.


Subjects
CRISPR-Cas Systems; Gene Editing/methods; RNA, Guide, Kinetoplastida/genetics; T-Lymphocytes/physiology; Cell Line; Gene Expression Regulation; Genome; Genomics; Humans; Induced Pluripotent Stem Cells/physiology
11.
ACS Cent Sci ; 4(11): 1520-1530, 2018 Nov 28.
Article in English | MEDLINE | ID: mdl-30555904

ABSTRACT

The arc of drug discovery entails a multiparameter optimization problem spanning vast length scales. The key parameters range from solubility (angstroms) to protein-ligand binding (nanometers) to in vivo toxicity (meters). Through feature learning, rather than feature engineering, deep neural networks promise to outperform both traditional physics-based and knowledge-based machine learning models for predicting molecular properties pertinent to drug discovery. To this end, we present the PotentialNet family of graph convolutions. These models are specifically designed for and achieve state-of-the-art performance for protein-ligand binding affinity. We further validate these deep neural networks by setting new standards of performance in several ligand-based tasks. In parallel, we introduce a new metric, the Regression Enrichment Factor EF_χ^(R), to measure the early enrichment of computational models for chemical data. Finally, we introduce a cross-validation strategy based on structural homology clustering that can more accurately measure model generalizability, which crucially distinguishes the aims of machine learning for drug discovery from standard machine learning tasks.

12.
Chem Sci ; 9(2): 513-530, 2018 Jan 14.
Article in English | MEDLINE | ID: mdl-29629118

ABSTRACT

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited by the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open-source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of a particular learning algorithm.

13.
J Chem Inf Model ; 57(8): 2068-2076, 2017 08 28.
Article in English | MEDLINE | ID: mdl-28692267

ABSTRACT

Multitask deep learning has emerged as a powerful tool for computational drug discovery. However, despite a number of preliminary studies, multitask deep networks have yet to be widely deployed in the pharmaceutical and biotech industries. This lack of acceptance stems from both software difficulties and a lack of understanding of the robustness of multitask deep networks. Our work aims to resolve both of these barriers to adoption. We introduce a high-quality open-source implementation of multitask deep networks as part of the DeepChem open-source platform. Our implementation enables simple Python scripts to construct, fit, and evaluate sophisticated deep models. We use our implementation to analyze the performance of multitask deep networks and related deep models on four collections of pharmaceutical data (three of which have not previously been analyzed in the literature). We split these data sets into train/valid/test using time and neighbor splits to test multitask deep learning performance under challenging conditions. Our results demonstrate that multitask deep networks are surprisingly robust and can offer strong improvement over random forests. Our analysis and open-source implementation in DeepChem provide an argument that multitask deep networks are ready for widespread use in commercial drug discovery.
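The abstract above mentions splitting data into train/valid/test by time. A minimal sketch of one way such a split could work follows: records are sorted by a date field, the earliest go to training, and the most recent are held out, so the model is always evaluated on data newer than anything it was fitted on. The function `time_split`, the 80/10/10 fractions, and the toy records are assumptions for illustration; this is plain Python and not the DeepChem splitter API or the authors' exact procedure.

```python
def time_split(records, frac_train=0.8, frac_valid=0.1):
    """Split records chronologically: earliest -> train, next -> valid, latest -> test."""
    ordered = sorted(records, key=lambda record: record["date"])
    n_train = int(frac_train * len(ordered))
    n_valid = int(frac_valid * len(ordered))
    train = ordered[:n_train]
    valid = ordered[n_train:n_train + n_valid]
    test = ordered[n_train + n_valid:]
    return train, valid, test


# Hypothetical assay records in arbitrary order; ISO dates sort correctly as strings.
toy_months = [5, 1, 9, 3, 10, 2, 8, 4, 7, 6]
toy_records = [
    {"compound_id": i, "date": f"2015-{month:02d}-01"}
    for i, month in enumerate(toy_months)
]

train, valid, test = time_split(toy_records)
```

Compared with a random split, this tends to give a more pessimistic but more realistic estimate when newer compounds differ systematically from older ones.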


Subjects
Drug Discovery/methods; Machine Learning; Absorption, Radiation; Inhibitory Concentration 50; Protein Kinase Inhibitors/chemistry; Protein Kinase Inhibitors/pharmacology; Serine Proteinase Inhibitors/chemistry; Serine Proteinase Inhibitors/pharmacology; Software; Ultraviolet Rays
14.
J Phys Chem B ; 120(45): 11674-11682, 2016 11 17.
Article in English | MEDLINE | ID: mdl-27775360

ABSTRACT

Fluorescence correlation spectroscopy (FCS) is a powerful tool to investigate molecular diffusion and relaxations, which may be utilized to study many problems such as molecular size and aggregation, chemical reaction, molecular transport and motion, and various kinds of physical and chemical relaxations. This article focuses on a problem related to using the relaxation term to study a reaction. If two species with different fluorescence photon emission efficiencies are connected by a reaction, the kinetic and equilibrium properties will be manifested in the relaxation term of the FCS curve. However, conventional FCS alone cannot simultaneously determine the equilibrium constant (K) and the relative fluorescence brightness (Q), both of which are indispensable in the extraction of thermodynamic and kinetic information from the experimental data. To circumvent the problem, an assumption of Q = 0 is often made for the weakly fluorescent species, which may lead to substantial errors when the assumption does not hold. We propose to combine the third-order FCS with the conventional second-order FCS to determine K and Q without invoking other resources. The strategy and formalism are verified by computer simulations and demonstrated in a classical example of the hairpin DNA-folding process.
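To see why second-order FCS alone leaves K and Q entangled, it may help to write out the commonly used textbook form of the correlation function for a two-state reaction in which both species diffuse identically. The notation below (G_D for the diffusion term, A_R for the relaxation amplitude, tau_R for the relaxation time, and Q defined as the brightness of the weaker species relative to the brighter one) is an assumption made for this note and is not copied from the article's own formalism.

```latex
G(\tau) = G_D(\tau)\,\bigl[\,1 + A_R\, e^{-\tau/\tau_R}\,\bigr],
\qquad
A_R = \frac{K\,(1-Q)^2}{(1+KQ)^2}
```

A fit to the measured curve yields the single number A_R, which is one equation in the two unknowns K and Q. Setting Q = 0 reduces it to A_R = K, which is the simplification the abstract cautions against; an additional independent measurement is needed to separate the two, which is the role the abstract assigns to third-order FCS.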


Subjects
DNA/chemistry; Kinetics; Spectrometry, Fluorescence; Thermodynamics